Lets first set some parameters
m <- 10000
nfoldsI <- 3
folds <- sample(1:nfoldsI, m, replace = TRUE)
adjustment_type <- "BH"
Now we create the betamix sample data according to Section 5.2 with two-dimensional covariates
mus_slope <- 1.5
prob_one_sided <- 0.25
Xs <- matrix(runif(m * 2, 0, 1), ncol = 2)
colnames(Xs) <- c("X1", "X2")
pi1s <- ifelse(Xs[, 1]^2 + Xs[, 2]^2 <= 1, 0.02, 0.4)
mus <- pmax(1.3, sqrt(Xs) %*% c(1, 1) * mus_slope)
mu_alphas <- 1 / mus
Hs <- stats::rbinom(m, size = 1, prob = pi1s)
Ps <- stats::runif(m) * (1 - Hs) + stats::rbeta(m, mu_alphas, 1) * Hs
To run regular IHW, we need to manually bin the covariates in 2d squares
bins1d <- seq(from = 0, to = 1, length.out = 6)
Xs_binned <- data.frame(
X1 = cut(Xs[, 1], bins1d),
X2 = cut(Xs[, 2], bins1d)
)
Xs_binned <- apply(Xs_binned, 1, function(row) {
factor(paste(row[[1]], row[[2]], sep = "*"))
})
We run the normal IHW with the binned covariates
ihw_no_forest <- ihw(Ps, Xs_binned, alpha = .1, folds = folds, use_forest = FALSE, adjustment_type = "BH", lp_solver = "lpsymphony", lambda = Inf)
Now we run the IHW Forest with 100 trees. For the Boca Leek tree we need to provide the censoring parameter tau. The exact value of tau should not be too important. We will revisit this issue in a later stage of the project.
tau <- 0.7
ihw_forest <- ihw(Ps, Xs, alpha = .1, folds = folds, use_forest = TRUE, ntrees = 100, tau = tau, lp_solver = "lpsymphony", lambda = Inf)
Lets have a look at power and FDP. Note that as Nikos, we are only using one MonteCarlo replicate. So the FDP should be interpreted with caution. However, I am confident that I am not violating any assumptions of the theorems (took special care of that), so FDR control should be guaranteed from a theoretical point of view. We see that for this example, that the forest structure increases power considerably.
fdp_eval_no_forest <- fdp_eval(Hs, IHW::rejected_hypotheses(ihw_no_forest))
fdp_eval_forest <- fdp_eval(Hs, IHW::rejected_hypotheses(ihw_forest))
rbind(
manually_cut = fdp_eval_no_forest,
use_forest = fdp_eval_forest
)
## rjs pow FDP FWER
## manually_cut 89 0.07948970 0.08988764 TRUE
## use_forest 94 0.08243376 0.10638298 TRUE
For completeness, let’s plot the weights in 2d as in Nikos’ draft of proof pdf
data <- data.frame(Xs, weight = ihw_no_forest@df[["weight"]])
ggplot(data = data, aes(x = X1, y = X2, color = weight)) +
geom_point()
data <- data.frame(Xs, weight = ihw_forest@df[["weight"]])
ggplot(data = data, aes(x = X1, y = X2, color = weight)) +
geom_point()